Main
David Zhang
Bioinformatics software engineer with experience across the entire software development lifecycle. Skilled in prototyping and benchmarking innovative solutions, as well as implementing, testing, and integrating software into production-ready pipelines.
Work Experience
Senior bioinformatics engineer
London, UK (hybrid)
Present - 2024
- Optimised and scaled machine learning tools to extract actionable insights from single-cell Perturb-seq datasets comprising millions of cells. Performed analyses and integrated findings to inform strategic decisions and guide company direction.
- Designed and deployed a robust data pipeline that ingested, tidied and version-controlled data for the Neo4j knowledge graph. Automated deployment of the graph through CI/CD with Terraform, enabling automated releases to AWS that improved reproducibility and operational efficiency.
Senior bioinformatics software engineer
Hinxton, UK (hybrid)
2024 - 2022
- Developed scalable bioinformatics pipelines in Nextflow to process solid tumour sequencing data. Pipeline steps included alignment, variant calling, driver mutation annotation, and therapy matching, supporting clinical and translational applications.
- Built a suite of Python and R packages to automate the clinical verification process, enabling earlier detection and resolution of issues. This automation shortened verification, which previously took 1 month per quarterly release, significantly accelerating the development cycle.
Bioinformatics internship (2 months)
London, UK (remote)
2021
- Created a reproducible aberrant splicing detection pipeline using Docker to support drug target discovery in patients with C9orf72 ALS.
Education
PhD, Bioinformatics
University College London
London, UK
2022 - 2017
- Thesis: Using transcriptomics to improve the genetic diagnosis rate of rare disease patients.
- Developed ggtranscript, an open-source R package for visualising transcript structures, which has received 150+ stars on GitHub and 250+ citations.
MSc, Neuroscience
University College London
London, UK
2016 - 2015
- Grade: Merit (68%)
BSc, Biomedical Science
University College London
London, UK
2015 - 2012
- Grade: 2:1 (69%)
Open-source software
Portfolio website
Present - 2022
- Showcases a selection of my open-source projects. The frontend is built with HTML5, SCSS, and JavaScript, and the backend is powered by Django. The site is hosted for free on PythonAnywhere.
Python packages
2023 - 2021
- autogroceries: Use Selenium to automate your grocery shop.
- stravaboard: A Streamlit dashboard for flexibly displaying and tracking Strava runs.
R packages
2022 - 2020
- ggtranscript: Visualising transcript structure and annotation using ggplot2.
- dasper: Detection of aberrant splicing events in RNA-sequencing.
Selected Publications
A complete list of my publications is available via Google Scholar.
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2
Bioinformatics
2022
- Role: Co-first author
Developmental Consequences of Defective ATG7-Mediated Autophagy in Humans
The New England Journal of Medicine
2021
- Role: Co-first author
Megadepth: efficient coverage quantification for BigWigs and BAMs
Bioinformatics
2021
- Role: R package developer
Incomplete annotation of disease-associated genes is limiting our understanding of Mendelian and complex neurogenetic disorders
Science Advances
2020
- Role: First author